The source coding problem with action-dependent side information at the decoder has recently
been introduced to model data acquisition in resource-constrained systems. In this paper, an efficient
algorithm for numerical computation of the rate-distortion-cost function for this problem is proposed,
and a convergence proof is provided. Moreover, a two-stage code design based on multiplexing is put
forth, whereby the first stage encodes the actions and the second stage is composed of an array of
classical Wyner-Ziv codes, one for each action. Specific coding/decoding strategies are designed based
on LDGM codes and message passing. Through numerical examples, the proposed code design is shown
to achieve performance close to the lower bound dictated by the rate-distortion-cost function.

Index Terms 

Rate-distortion theory, side information "vending machine", Blahut-Arimoto algorithm, code design, 
LDGM, message passing. 

I. Introduction 

The source coding problem in which the decoder can take actions that affect the availability or
quality of the side information at the decoder was introduced in [1]. The problem generalizes the
well-known Wyner-Ziv set-up and can be used to model data acquisition in resource-constrained
systems, such as sensor networks. In the model studied in [1], each action is associated with a cost
and the system design is subject to an average cost constraint. The information-theoretic analysis
of the problem was fully addressed in [1]. In this paper, instead, we tackle the practical open
issues, namely the computation of the rate-distortion-cost function and code design.

Specifically, the rate-distortion-cost function for the source coding problem with action-dependent
side information was derived in [1]. However, no specific algorithm was proposed for its computation.
A first contribution of this paper is to propose such an algorithm by generalizing the
classical Blahut-Arimoto (BA) approach, which was introduced for the Wyner-Ziv problem in
[2]. Convergence of the algorithm is also proved.

Moreover, while the theory in [1] demonstrates the existence of coding and decoding strategies
able to achieve the rate-distortion-cost bound, practical code constructions have not been investigated
yet. It is recalled that, for classical lossy source coding problems, codes that have been
able to achieve the rate-distortion bound include Low Density Generator Matrix (LDGM) codes [3],
polar codes [4] and trellis-based quantization codes [5]. For the Wyner-Ziv problem, efficient
codes include compound LDPC/LDGM codes [6] and polar codes [4]. A second contribution of
this paper is hence the study of code design for source coding problems with action-dependent
side information. As shown in [1], optimal codes for this problem have a successive refinement
structure, in which the first layer produces the action sequence and the refinement layer uses
binning to leverage the side information at the decoder. Here, we first observe that a layered
code structure in which the refinement layer uses a multiplexing of separate classical Wyner-Ziv
codes, one for each action, is optimal. This allows us to simplify the code structure with respect
to the successive refinement strategy in [1]. LDGM-based codes with message passing encoding
are designed and demonstrated via numerical results to perform close to the rate-distortion-cost
function.

The paper is organized as follows. In Section II, the action-dependent source coding problem is
described and results from [1] are summarized. In Section III, we describe the proposed algorithm
for computation of the rate-distortion-cost function, and in Section IV a practical code design
is proposed. Finally, in Section V, we present numerical results for a specific example.

Fig. 1. Source coding with action-dependent side information.

A. Notation 

Throughout this work, we let upper case, lower case and calligraphic letters denote random
variables, values and alphabets of the random variables, respectively. For jointly distributed
random variables, $P_X(x)$, $P_{X|Y}(x|y)$ and $P_{X,Y}(x,y)$ denote the probability mass function (pmf)
of $X$, the conditional pmf of $X$ given $Y$ and the joint pmf of $X$ and $Y$, respectively. To simplify notation,
the subscripts of the pmfs may be omitted, e.g., $P(x|y)$ may be used instead of $P_{X|Y}(x|y)$.
The notation $X^n$ represents the tuple $(X_1, X_2, \ldots, X_n)$, and $[a,b]$, where $a, b \in \mathbb{Z}$ with $a < b$,
denotes the set of integers $\{a, a+1, \ldots, b-1, b\}$. Moreover, $\mathbb{Z}_+ = \{0, 1, \ldots\}$, $\mathbb{N} = \mathbb{Z}_+ \setminus \{0\}$,
and $\mathbb{1}_{\{\mathrm{cond}\}}$ denotes the indicator function, which is one when cond is true and zero otherwise.
The notation $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denotes the floor and ceiling operators, respectively.

II. Background 

In this section, we recall the definition of source coding problems with action-dependent side
information and review the rate-distortion-cost function obtained in [1].

A. System Model 

The source coding problem with action-dependent side information introduced in [1] is illustrated
in Fig. 1. In this problem, the source $X^n \in \mathcal{X}^n$ is memoryless and each sample is
distributed according to the pmf $P_X$. At the encoder, the encoding function

$$f : \mathcal{X}^n \to [1, \lfloor 2^{nR} \rfloor], \tag{1}$$

January 29, 2013 DRAFT 



maps the source $X^n$ into a message $M \in [1, \lfloor 2^{nR} \rfloor]$, where $R$ denotes the rate in bits per sample.
At the decoder, an action sequence $A^n \in \mathcal{A}^n$ is chosen according to an action strategy

$$g : [1, \lfloor 2^{nR} \rfloor] \to \mathcal{A}^n, \tag{2}$$

which maps the message $M$ into an action sequence $A^n$. Based on $A^n$, the side information $Y^n \in \mathcal{Y}^n$
is conditionally independent and identically distributed (iid) according to the conditional pmf
$P_{Y|X,A}$, so that we have

$$P_{Y^n|X^n,A^n}(y^n|x^n,a^n) = \prod_{i=1}^{n} P_{Y|X,A}(y_i|x_i,a_i). \tag{3}$$

The decoder makes a reconstruction $\hat{X}^n \in \hat{\mathcal{X}}^n$ of $X^n$ according to the decoding function

$$h : [1, \lfloor 2^{nR} \rfloor] \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n, \tag{4}$$

which maps message $M$ and side information $Y^n$ into the estimate $\hat{X}^n$.

The action cost function $\Lambda(a) : \mathcal{A} \to \mathbb{R}_+$ is defined such that $\Lambda(a) = 0$ for some $a \in \mathcal{A}$ and
$\Lambda_{\max} = \max_{a \in \mathcal{A}} \Lambda(a) < \infty$, and the distortion function $d(x, \hat{x}) : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_+$ is defined such
that for each $x \in \mathcal{X}$ there is an $\hat{x} \in \hat{\mathcal{X}}$ satisfying $d(x, \hat{x}) = 0$. The rate-distortion-cost tuple
$(R, D, C)$ is then said to be achievable if and only if, for all $\epsilon > 0$ and all sufficiently large $n \in \mathbb{N}$,
there exist an encoding function $f$, an action function $g$ and a decoding function $h$
satisfying the distortion constraint

$$\mathrm{E}\left[\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \le n(D + \epsilon) \tag{5}$$

and the action cost constraint

$$\mathrm{E}\left[\sum_{i=1}^{n} \Lambda(A_i)\right] \le n(C + \epsilon). \tag{6}$$

The rate-distortion-cost function, denoted as $R(D,C)$, is defined as the infimum of all rates $R$
such that the tuple $(R, D, C)$ is achievable.
Fig. 2. Optimal encoder for source coding problems with action-dependent side information.



B. Rate-Distortion-Cost Function 

The rate-distortion-cost function $R(D,C)$ was derived in [1] and is summarized below.

Lemma 1. ([1, Theorem 1]) The rate-distortion-cost function for the source coding problem
with action-dependent side information is given as

$$R(D, C) = \min\; I(X; A) + I(X; U|Y, A), \tag{7}$$

where the joint pmf is of the form

$$P_{X,Y,A,U}(x, y, a, u) = P_X(x) P_{U|X}(u|x) \mathbb{1}_{\{\eta(u) = a\}} P_{Y|X,A}(y|x,a), \tag{8}$$

and the minimization is over all pmfs $P_{U|X}$ and deterministic functions $\eta : \mathcal{U} \to \mathcal{A}$ under which
the conditions

$$\mathrm{E}[d(X, \hat{X}^{\mathrm{opt}}(U,Y))] \le D \tag{9}$$

and

$$\mathrm{E}[\Lambda(A)] \le C \tag{10}$$

hold. The function $\hat{X}^{\mathrm{opt}} : \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}$ denotes the best estimate of $X$ given $U$ and $Y$, i.e.,

$$\hat{X}^{\mathrm{opt}}(u,y) = \arg\min_{\hat{x} \in \hat{\mathcal{X}}} \mathrm{E}[d(X,\hat{x}) | U = u, Y = y]. \tag{11}$$

Moreover, the cardinality of the set $\mathcal{U}$ can be restricted as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + 2$.
C. Optimal Coding Strategy

The proof of achievability of the rate-distortion-cost function in [1] shows that an optimal
encoder has the structure illustrated in Fig. 2 and consists of the following two steps.

• Action Coding: The source sequence $X^n$ is mapped to an action sequence $A^n$. The action
sequence is selected from a codebook $\mathcal{C}_A$ of about $2^{nI(X;A)}$ codewords, each of type
approximately equal to $P_A$. The index $B^k$ identifies the selected codeword $A^n$, and hence
consists of $k$ bits, with $k$ approximately equal to $nI(X;A)$. The selection of $A^n$ is done with
the aim of ensuring that $A^n$ and $X^n$ are jointly typical with respect to the joint pmf
$P_{X,A}(x,a) = P_{A|X}(a|x)P_X(x)$.

• Source Coding: Given the action sequence $A^n$, a source codebook is chosen out of a set of
around $2^{nI(X;A)}$ codebooks, one for each codeword in $\mathcal{C}_A$. Each codeword $U^n$ in the selected
source codebook has a joint type with $A^n$ close to $P_{A,U}$, and the number of codewords is
about $2^{nI(X;U|A)}$. The source sequence is mapped to a sequence $U^n$ taken from the selected
codebook with the objective of ensuring that $X^n$, $A^n$ and $U^n$ are
jointly typical with respect to the joint pmf $P_{X,A,U}(x,a,u)$. Each source codebook is divided
into around $2^{nI(X;U|A,Y)}$ subcodebooks, or bins, in order to leverage the side information at
the receiver using Wyner-Ziv decoding.

The message $M$ is given by the concatenation of the bits $B^k$ and the bin-index bits, and thus the overall rate
of the action code and the source codes is given by (7). Upon receiving the message $M$ from the
encoder, the decoder first reconstructs the action sequence $A^n$. The action sequence is used to
measure the side information $Y^n$. As $A^n$ is known, the decoder also knows the source codebook
from which $U^n$ is selected, and $U^n$ is then recovered by using Wyner-Ziv decoding based on
the side information $Y^n$. In the end, the final estimate $\hat{X}^n$ is obtained as $\hat{X}_i = \hat{X}^{\mathrm{opt}}(U_i, Y_i)$ for
$i \in [1,n]$.
III. Computation of the Rate-Distortion-Cost Function

In this section, we first reformulate the problem in (7) by introducing Shannon strategies. This
result is then used to propose a BA-type algorithm for the computation of the rate-distortion-cost
function (7).

A. Shannon Strategies

We first observe that, from Lemma 1, it is sufficient to restrict the minimization to all joint
distributions for which $A$ is a deterministic function $A = \eta(U)$. Moreover, the final estimate of
$X$ in (11) is a function of both $U$ and $Y$. Based on these facts, we define a Shannon strategy
$T \in \mathcal{T} \subseteq \hat{\mathcal{X}}^{|\mathcal{Y}|} \times \mathcal{A}$ as a vector of cardinality $|\mathcal{Y}| + 1$, in which the first $|\mathcal{Y}|$ elements are indexed
by the elements in $\mathcal{Y}$, with $T(y) \in \hat{\mathcal{X}}$ for $y \in \mathcal{Y}$, and the last element is denoted $\mathrm{a}(T) \in \mathcal{A}$. We
also define the disjoint sets $\mathcal{T}_a = \{t \in \mathcal{T} : \mathrm{a}(t) = a\}$ for all actions $a \in \mathcal{A}$. The rate-distortion-cost
function (7) can be restated in terms of the defined Shannon strategies as formalized in the
next proposition.
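The construction of the strategy alphabet can be made concrete with a small enumeration. The following is a minimal sketch, assuming small finite alphabets; the function names and the dictionary representation of a strategy are illustrative choices, not part of the paper.

```python
from itertools import product

def shannon_strategies(xhat_alphabet, y_alphabet, actions):
    """Enumerate all strategies t = (t(y) for every y, a(t))."""
    strategies = []
    for a in actions:
        # One reconstruction symbol per side-information value y.
        for recon in product(xhat_alphabet, repeat=len(y_alphabet)):
            strategies.append({"recon": dict(zip(y_alphabet, recon)), "action": a})
    return strategies

def group_by_action(strategies, actions):
    """Return the disjoint sets T_a = {t : a(t) = a}."""
    return {a: [t for t in strategies if t["action"] == a] for a in actions}

# Hypothetical toy alphabets: binary reconstruction, ternary side information.
T = shannon_strategies([0, 1], ["y0", "y1", "e"], [0, 1])
T_a = group_by_action(T, [0, 1])
```

The full alphabet has $|\hat{\mathcal{X}}|^{|\mathcal{Y}|} |\mathcal{A}|$ elements, which grows quickly; this is why the cardinality bound in Proposition 1 matters in practice.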

Proposition 1. Let $T \in \mathcal{T} \subseteq \hat{\mathcal{X}}^{|\mathcal{Y}|} \times \mathcal{A}$ denote a Shannon strategy vector as defined above. The
rate-distortion-cost function in (7) can be expressed as

$$R(D,C) = \min\; I(X; \mathrm{a}(T)) + I(X; T|Y, \mathrm{a}(T)), \tag{12}$$

where the joint pmf $P_{X,Y,T}$ is of the form

$$P_{X,Y,T}(x, y, t) = P_X(x) P_{T|X}(t|x) P_{Y|A,X}(y|\mathrm{a}(t), x), \tag{13}$$

and the minimization is over all pmfs $P_{T|X}$ under the constraints

$$\mathrm{E}[\Lambda(A)] = \sum_{t \in \mathcal{T}, x \in \mathcal{X}} P_X(x) P_{T|X}(t|x) \Lambda(\mathrm{a}(t)) \le C \tag{14}$$

and

$$\mathrm{E}[d(X, T(Y))] = \sum_{t \in \mathcal{T}, x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y,T}(x, y, t)\, d(t(y), x) \le D. \tag{15}$$

Moreover, the cardinality of the alphabet $\mathcal{T}$ can be restricted as $|\mathcal{T}| \le |\mathcal{X}||\mathcal{A}| + 2$.

Proof: Given an alphabet $\mathcal{U}$, a pmf $P_{U|X}$ and a function $\eta : \mathcal{U} \to \mathcal{A}$, the sum of the two
mutual informations in (7) can be seen to be equal to the sum of the two mutual informations in
(12), and the average distortion and cost in (9) and (10) to be equal to (15) and (14), respectively,
by defining $P_{T|X}$ as follows. For each $u \in \mathcal{U}$, define a strategy $t$ with $P_{T|X}(t|x) = P_{U|X}(u|x)$
such that $\mathrm{a}(t) = \eta(u)$ and $t(y) = \hat{X}^{\mathrm{opt}}(u, y)$ for $y \in \mathcal{Y}$. ■

Remark. The characterization in Proposition 1 generalizes the formulation of the Wyner-Ziv
rate-distortion function in terms of Shannon strategies given in . 

The following lemma extends to the rate-distortion-cost function R(D, C) some well-known 
properties for the rate-distortion function (see, e.g. 0, JU). This will be useful in the next 
section when discussing the computation of R(D,C). 

Lemma 2. The following properties hold for the rate-distortion-cost function $R(D,C)$:

1) $R(D, C)$ is non-increasing, convex and continuous for $D \in [0, \infty)$ and $C \in [0, \infty)$.

2) $R(D,C)$ is strictly decreasing in $D \in [0, D_{\max}(C)]$ and $R(D_{\max}(C), C) = 0$, where

$$D_{\max}(C) = \min_{P_T} \sum_{t \in \mathcal{T}, x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y,T}(x,y,t)\, d(t(y),x), \tag{16}$$

under the constraint

$$\mathrm{E}[\Lambda(\mathrm{a}(T))] = \sum_{t \in \mathcal{T}} \Lambda(\mathrm{a}(t)) P_T(t) \le C. \tag{17}$$

3) For all $D \in [0, D_{\max}(C)]$, the minimum in (12) is attained when the distortion inequality
(15) is satisfied with equality.

Proof: The lemma is proved by the arguments in [8, Lemma 10.4.1].
B. Computation of the Rate-Distortion-Cost Function 
Algorithm 1 BA-type Algorithm for Computation of the Rate-Distortion-Cost Function
input: Lagrange multipliers $s \le 0$ and $m \le 0$.
output: $R(D_{s,m}, C_{s,m})$ with $C_{s,m}$ and $D_{s,m}$ as in (19)-(20).
initialize: $P_{T|X}$
repeat
    Compute $Q_A$ as in (25).
    Compute $Q_{T,Y}$ as in (26).
    Minimize $F(P_{T|X}, Q_{T,Y}, Q_A)$ with respect to $P_{T|X}$ using Algorithm 2.
until convergence
$P^*_{T|X} \leftarrow P_{T|X}$



In order to derive a BA-type algorithm to solve the problem in (12), we introduce Lagrange
multipliers $m$ for the cost constraint in (14) and $s$ for the distortion constraint (15). The following
proposition provides a parametric characterization of the rate-distortion-cost function in terms
of the pair $(s, m)$.



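Proposition 2. For each $s \le 0$ and $m \le 0$, define the rate-distortion-cost tuple $(R_{s,m}, D_{s,m}, C_{s,m})$
via the following equations

$$R_{s,m} = s D_{s,m} + m C_{s,m} + \min_{P_{T|X}} \left\{ I(X; A) + I(X; T|Y, \mathrm{a}(T)) - s\,\mathrm{E}[d(X, T(Y))] - m\,\mathrm{E}[\Lambda(\mathrm{a}(T))] \right\}, \tag{18}$$

$$D_{s,m} = \sum_{t \in \mathcal{T}, x \in \mathcal{X}, y \in \mathcal{Y}} P_X(x) P^*_{T|X}(t|x) P_{Y|A,X}(y|\mathrm{a}(t), x)\, d(t(y), x), \tag{19}$$

$$C_{s,m} = \sum_{t \in \mathcal{T}, x \in \mathcal{X}} P_X(x) P^*_{T|X}(t|x) \Lambda(\mathrm{a}(t)), \tag{20}$$

where $P^*_{T|X}$ denotes a minimizing pmf $P_{T|X}$ for the optimization problem in (18). Then, the
following facts hold.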
following facts hold 
1) The tuple $(R_{s,m}, D_{s,m}, C_{s,m})$ lies on the rate-distortion-cost function, i.e.,

$$R_{s,m} = R(D_{s,m}, C_{s,m}). \tag{21}$$

2) Every point $(R, D, C)$ on the rate-distortion-cost function for $D \in [0, D_{\max}(C)]$ can be
written as (18)-(20) for some $s \le 0$ and $m \le 0$.

3) The rate-distortion-cost function is given as

$$R(D, C) = \max_{s \le 0,\, m \le 0} \left( R_{s,m} + s(D - D_{s,m}) + m(C - C_{s,m}) \right). \tag{22}$$

Proof: The proposition above follows by strong duality as guaranteed by Slater's condition
[9, Section 5.2.3], and can also be derived directly as in [2]. ■
Given the proposition above, one can trace the rate-distortion-cost function by solving problem
(18) and using (19) and (20) for all $s \le 0$ and $m \le 0$. Inspired by the standard BA approach,
we now show that problem (18) can be solved by using alternate optimization with respect to
$P_{T|X}$ and appropriately defined auxiliary pmfs $Q_{T,Y}$ and $Q_A$. To do this, we define the function
$F(\cdot)$ of $P_{T|X}$ and auxiliary pmfs $Q_{T,Y}$ and $Q_A$ as in (23),



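$$
\begin{aligned}
F(P_{T|X}, Q_{T,Y}, Q_A) ={}& D_{KL}(P_{Y,A} \| Q_A) - \sum_{x \in \mathcal{X}, y \in \mathcal{Y}, t \in \mathcal{T}} P_{X,Y,T}(x,y,t) \log P_{Y|X,A}(y|x, \mathrm{a}(t)) \\
&+ \sum_{x \in \mathcal{X}} P_X(x)\, D_{KL}(P_{Y,T|X}(\cdot,\cdot|x) \| Q_{T,Y}) - s \sum_{t \in \mathcal{T}, x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y,T}(x,y,t)\, d(t(y),x) \\
&- m \sum_{t \in \mathcal{T}, x \in \mathcal{X}} \Lambda(\mathrm{a}(t)) P_X(x) P_{T|X}(t|x),
\end{aligned}
\tag{23}
$$

where $D_{KL}(P \| Q)$ denotes the Kullback-Leibler (KL) divergence$^1$ and $P_{X,Y,T}$, $P_{Y,T|X}$ and $P_{Y,A}$
are calculated from the joint pmf (13). We then have the following result.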



Proposition 3. For any $s \le 0$ and $m \le 0$, we have

$$R(D_{s,m}, C_{s,m}) = s D_{s,m} + m C_{s,m} + \min_{P_{T|X}, Q_{T,Y}, Q_A} F(P_{T|X}, Q_{T,Y}, Q_A), \tag{24}$$

with $D_{s,m}$ and $C_{s,m}$ as in (19)-(20), where the distribution $P^*_{T|X}$ denotes a minimizing distribution in (24). Moreover,
the function $F(P_{T|X}, Q_{T,Y}, Q_A)$ is jointly convex in the pmfs $P_{T|X}$, $Q_{T,Y}$ and $Q_A$.

$^1$ The Kullback-Leibler divergence [8] is defined as $D_{KL}(P \| Q) = \sum_i P(i) \log_2 \frac{P(i)}{Q(i)}$ for pmfs $P$ and $Q$.


Proof: The proof technique for the first part is due to [10], and is based on showing that
the pmf $Q_A$ minimizing $F(\cdot)$ for fixed $Q_{T,Y}$ and $P_{T|X}$ is

$$Q_A(a) = \sum_{x \in \mathcal{X}, t \in \mathcal{T}_a} P_X(x) P_{T|X}(t|x) = P_A(a), \tag{25}$$

and the pmf $Q_{T,Y}$ minimizing $F(\cdot)$ for fixed $Q_A$ and $P_{T|X}$ is given by

$$Q_{T,Y}(t, y) = \sum_{x \in \mathcal{X}} P_X(x) P_{Y|X,A}(y|x, \mathrm{a}(t)) P_{T|X}(t|x) = P_{T,Y}(t, y). \tag{26}$$

The convexity of the function $F(\cdot)$ follows from the log-sum inequality [8]. ■
Based on Proposition 3, the proposed BA-type algorithm for computation of the rate-distortion-cost
function then consists of alternately minimizing (24) with respect to $P_{T|X}$, $Q_{T,Y}$ and $Q_A$.



Due to the convexity of (24), the algorithm is known to converge to the optimal point, similarly
to [2]. The proposed algorithm is summarized in Algorithm 1. The step of minimizing
$F(P_{T|X}, Q_{T,Y}, Q_A)$ with respect to $P_{T|X}$ is discussed in the rest of this section.
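The two closed-form updates (25)-(26) and the evaluation of $F$ can be sketched numerically as follows. This is a minimal sketch, not the authors' implementation: the array layout, the function names, the toy alphabet sizes, the random pmfs and the multiplier values are all assumptions made for illustration, and all logarithms are taken base 2.

```python
import numpy as np

def joint_pmf(P_X, P_TgX, P_YgXA, act):
    """P[x, y, t] = P_X(x) P_{T|X}(t|x) P_{Y|X,A}(y|x, a(t)), cf. (13)."""
    P_YgXT = P_YgXA[:, act, :].transpose(0, 2, 1)       # shape (nx, ny, nt)
    return P_X[:, None, None] * P_TgX[:, None, :] * P_YgXT

def update_Q(P_X, P_TgX, P_YgXA, act, n_actions):
    """Closed-form minimizers: Q_A = P_A as in (25), Q_{T,Y} = P_{T,Y} as in (26)."""
    J = joint_pmf(P_X, P_TgX, P_YgXA, act)
    Q_TY = J.sum(axis=0).T                               # shape (nt, ny)
    Q_A = np.bincount(act, weights=Q_TY.sum(axis=1), minlength=n_actions)
    return Q_A, Q_TY

def kl_sum(p, q):
    """sum p*log2(p/q), using the convention 0*log(0/q) = 0."""
    q = np.broadcast_to(q, p.shape)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def F(P_X, P_TgX, Q_TY, Q_A, P_YgXA, act, dist_xyt, cost, s, m):
    """Evaluate the five terms of (23) in order."""
    J = joint_pmf(P_X, P_TgX, P_YgXA, act)
    P_YT = J.sum(axis=0)                                 # shape (ny, nt)
    P_YA = np.stack([P_YT[:, act == a].sum(axis=1) for a in range(len(Q_A))], axis=1)
    P_YgXT = P_YgXA[:, act, :].transpose(0, 2, 1)
    mask = J > 0
    term1 = kl_sum(P_YA, Q_A[None, :])
    term2 = -float(np.sum(J[mask] * np.log2(P_YgXT[mask])))
    term3 = kl_sum(J, P_X[:, None, None] * Q_TY.T[None, :, :])
    term4 = -s * float(np.sum(J * dist_xyt))
    term5 = -m * float(np.sum(cost[act] * (P_X @ P_TgX)))
    return term1 + term2 + term3 + term4 + term5

# Hypothetical toy instance: binary X, Y, reconstruction and action.
rng = np.random.default_rng(0)
nx, ny, na = 2, 2, 2
recon = np.array([[r0, r1] for r0 in (0, 1) for r1 in (0, 1)] * na)   # t(y), shape (nt, ny)
act = np.repeat(np.arange(na), 4)                                      # a(t)
nt = len(act)
P_X = np.array([0.4, 0.6])
P_TgX = rng.random((nx, nt)); P_TgX /= P_TgX.sum(axis=1, keepdims=True)
P_YgXA = rng.random((nx, na, ny)); P_YgXA /= P_YgXA.sum(axis=2, keepdims=True)
dist_xyt = (np.arange(nx)[:, None, None] != recon.T[None, :, :]).astype(float)  # Hamming
cost = np.array([0.0, 1.0])
s, m = -1.0, -0.5

Q_A, Q_TY = update_Q(P_X, P_TgX, P_YgXA, act, na)
F_opt = F(P_X, P_TgX, Q_TY, Q_A, P_YgXA, act, dist_xyt, cost, s, m)
F_unif = F(P_X, P_TgX, np.full((nt, ny), 1 / (nt * ny)), np.full(na, 1 / na),
           P_YgXA, act, dist_xyt, cost, s, m)
```

For a fixed $P_{T|X}$, the value `F_opt` obtained with the marginals is never larger than the value for any other choice of the auxiliary pmfs, such as the uniform choice in `F_unif`. The remaining step of Algorithm 1, the minimization over $P_{T|X}$, is not covered by this sketch.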



C. Minimizing $F$ over $P_{T|X}$

To minimize the function $F(P_{T|X}, Q_{T,Y}, Q_A)$ with respect to $P_{T|X}$ for fixed $Q_A$ and $Q_{T,Y}$, we
add a Lagrange multiplier $\lambda_x$ for each equality constraint $\sum_{t \in \mathcal{T}} P_{T|X}(t|x) = 1$ with $x \in \mathcal{X}$, and
resort to the KKT conditions as necessary and sufficient conditions for optimality. This property
of the KKT conditions follows by strong duality due to the validity of Slater's condition for
the problem [9, Section 5.2.3]. We assume $P_X(x) > 0$ without loss of generality, since values
of $x$ with $P_X(x) = 0$ can be removed from the alphabet $\mathcal{X}$.
By strong duality, we obtain the following optimization problem

$$
\min_{\substack{P_{T|X} \ge 0 \\ \sum_{t \in \mathcal{T}} P_{T|X}(t|x) = 1}} F(P_{T|X}, Q_A, Q_{T,Y})
= \max_{\{\lambda_x\}} \min_{P_{T|X} \ge 0} F(P_{T|X}, Q_A, Q_{T,Y}) + \sum_{x \in \mathcal{X}} \lambda_x \left( \sum_{t \in \mathcal{T}} P_{T|X}(t|x) - 1 \right). \tag{27}
$$

In the proposed approach, the outer maximization in (27) is then performed using the standard
subgradient method. The inner minimization is instead performed by finding the stationary points
of the function. This leads to the system of equalities $g_{a|x}(P_{A|x}, \mu_x) = P_{A|X}(a|x)$ for $a \in \mathcal{A}$
and $x \in \mathcal{X}$, with
■ )/' • 1-/3 

9a\x(PA\x,Vx) = P A \x{a\xY — — - — — — 

(28) 

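where

$$\alpha_{t,x} = Q_A(\mathrm{a}(t))\, 2^{m \Lambda(\mathrm{a}(t))} \cdot 2^{\sum_{y \in \mathcal{Y}} P_{Y|X,A}(y|x, \mathrm{a}(t)) \left[ s\, d(t(y), x) + \log Q_{T,Y}(t,y) \right]} \tag{29}$$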

t&T a 

and $\beta \in (0, 1)$ is a parameter of the algorithm (see Appendix A).

Proposition 4. The algorithm described in Algorithm 1 and Algorithm 2 converges to the rate-distortion-cost
function $R(D_{s,m}, C_{s,m})$ for all $s \le 0$ and $m \le 0$.

Proof: See Appendix A. ■

IV. Code Design 

In this section, we consider the design of specific encoders and decoders for the source coding
problem with action-dependent side information. The goal is to design codes that perform close to
the rate-distortion-cost function given in Lemma 1 for some fixed pmf in (8) (or, equivalently, in
Proposition 1 for some fixed pmf $P_{X,Y,T}$).
Algorithm 2 Algorithm for Minimization of $F$ with respect to $P_{T|X}$
input: $Q_{T,Y}$ and $Q_A$.

output: $P_{T|X}$.

parameters: Subgradient weights 9i — 4,i gZ + and constant (3 E (0, 1). 
initialization: i = 0; fix = I for x E X; P^ x (a\x) = for t E T,x E X. 
repeat 

Perform fixed-point iterations on the system Pa\x{p\%) — 9a\x(PA\x, Hx) for a E A and 
x E X with starting point Pv) x until convergence to obtain P% +> ! 



Update the subgradients as 

= A*? + 4y (l - E^ixVMJ for x G AT. 
i i + 1. 
until convergence 

(a(t)|x). 
Fig. 3. Code design for source coding problems with action-dependent side information. The illustration is for $\mathcal{A} = \{0, 1\}$.



A. Achievability via Multiplexing 



As explained in Section II-C, the achievability proof in [1] is based on an action codebook
$\mathcal{C}_A$ for the action sequences $A^n$ of about $2^{nI(X;A)}$ codewords and $2^{nI(X;A)}$ source codebooks of
about $2^{nI(X;U|A)}$ codewords for the sequences $U^n$, where each source codebook corresponds to
an action sequence $A^n$. We also recall that binning is performed on the source codebooks in
order to reduce the rate.

Here, we first observe that the code design can be simplified without loss of optimality by using
the encoder and decoder structures in Fig. 3. Accordingly, as in [1], the action encoder selects the
action sequence $A^n$ from the codebook $\mathcal{C}_A$ and sends the corresponding index $B^k$ to the decoder, where
$k = \lceil nI(X; A) \rceil$. However, rather than using $2^{nI(X;A)}$ source codebooks, we utilize only $|\mathcal{A}|$
source codebooks $\mathcal{C}_{s,a}$, $a \in \mathcal{A}$. Specifically, the source codebook $\mathcal{C}_{s,a}$ has about $2^{nP_A(a)I(X;U|A=a)}$
codewords, and each codeword in codebook $\mathcal{C}_{s,a}$ has a length of $n_a = \lceil n(P_A(a) + \epsilon) \rceil$ symbols
for some $\epsilon > 0$.

To elaborate, as seen in Fig. 3(a), after action encoding, which takes place as in [1], the source
$X^n$ is demultiplexed into $|\mathcal{A}|$ subsequences, such that the $a$-th subsequence $X_a^{n_a}$ contains all
symbols $X_i$ for which $A_i = a$. For sufficiently large $n$, by the law of large numbers,
the number of symbols in $X_a^{n_a}$ is less than $n_a$ with high probability. Appropriate padding is
then used to make the length of the sequence exactly $n_a$ symbols. The $a$-th subsequence $X_a^{n_a}$
is then compressed using the codebook $\mathcal{C}_{s,a}$ with the objective of ensuring that $X_a^{n_a}$ and $U_a^{n_a}$
are jointly typical with respect to the pmf $P_{X,U|A}(\cdot, \cdot|a)$. Binning is performed on each source
codebook so that the number of bins is $2^{nP_A(a)I(X;U|Y,A=a)}$. The bin index $B_a^{k_a}$ of $U_a^{n_a}$ thus consists
of $k_a = \lceil nP_A(a)I(X; U|Y, A = a) \rceil$ bits. Overall, the rate of the message $M$, consisting of
the indices $B^k$ for the action code and $B_a^{k_a}$ for the source codes with $a \in \mathcal{A}$, is
$I(X; A) + \sum_{a \in \mathcal{A}} P_A(a) I(X; U|Y, A = a) = I(X; A) + I(X; U|A, Y)$, as desired.
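The demultiplexing and padding step, together with its inverse at the decoder, can be sketched as follows. This is a minimal illustration only: the toy sequences, the pad symbol, the value of $\epsilon$ and the function names are assumptions, and the per-action Wyner-Ziv codes themselves are not modeled.

```python
import math

def demultiplex(x, a, actions, pad_symbol, lengths):
    """Split x into one subsequence per action value and pad it to length n_a."""
    subs = {}
    for act in actions:
        sub = [xi for xi, ai in zip(x, a) if ai == act]
        if len(sub) > lengths[act]:
            # Happens with vanishing probability for large n; treated as an error event.
            raise ValueError("subsequence longer than n_a")
        subs[act] = sub + [pad_symbol] * (lengths[act] - len(sub))
    return subs

def multiplex(subs, a):
    """Inverse operation: interleave the subsequences following the actions a."""
    pos = {act: 0 for act in subs}
    out = []
    for ai in a:
        out.append(subs[ai][pos[ai]])
        pos[ai] += 1
    return out

# Hypothetical toy sequences with n = 8 and binary actions.
x = [3, 1, 4, 1, 5, 9, 2, 6]
a = [0, 1, 0, 0, 1, 0, 1, 0]
P_A = {0: 5 / 8, 1: 3 / 8}
eps = 0.1
n_a = {act: math.ceil(len(x) * (p + eps)) for act, p in P_A.items()}
subs = demultiplex(x, a, [0, 1], None, n_a)
x_back = multiplex(subs, a)
```

The same `demultiplex` call applies to the side information $Y^n$ at the decoder, since the decoder knows $A^n$ before measuring $Y^n$.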

At the decoder, as seen in Fig. 3(b), the action sequence $A^n$ is reconstructed and is used
to measure the side information $Y^n$. The side information $Y^n$ is demultiplexed into $|\mathcal{A}|$ subsequences,
such that the $a$-th subsequence $Y_a^{n_a}$ contains all symbols for which $A_i = a$.
Each of the subsequences $U_a^{n_a}$ is then reconstructed by using Wyner-Ziv decoding based on
the message bits $B_a^{k_a}$ and the side information $Y_a^{n_a}$, and the reconstructed source subsequences
are obtained as $\hat{X}_{a,i} = \hat{X}^{\mathrm{opt}}(U_{a,i}, Y_{a,i})$ for $i \in [1, n_a]$, where $\hat{X}_{a,i}$ denotes the $i$-th symbol
of the sequence $\hat{X}_a^{n_a}$. Finally, the source reconstruction $\hat{X}^n$ is obtained by multiplexing the
subsequences $\hat{X}_a^{n_a}$ for $a \in \mathcal{A}$.

Remark. The proposed code structure also applies to the classical successive refinement problem
[11] and can be used to simplify the code design proposed in [12].

B. The Action Code 



Based on the encoder structure in Fig. 3(a), we discuss the specific design of the action
encoder. The action code $\mathcal{C}_A$ has to ensure that the codewords $A^n$ approximately have the type
$P_A$, and the action encoder must obtain a codeword $A^n$ that is jointly typical with $X^n$ with respect to
the joint pmf $P_{X,A}$. These conditions are satisfied by optimal source codes [4]. Optimal source
codes can be designed using LDGM codes or polar codes as shown in [13] and [4], respectively.
Here, we adopt LDGM codes as proposed in [13], [14]. Specifically, in the following, we define
an encoder based on message passing. This uses ideas from [13] to handle the general alphabet
and pmf $P_A$, and from [14] to implement message passing and decimation. The key difference
with respect to [14] is that there the goal of the encoder is to minimize the Hamming distance,
while the aim in this paper is to find an action sequence that is jointly typical with the source.

We use the code described by the factor graph in Fig. 4. The bottom section of the graph is an
LDGM code (see, e.g., [13]). The sequence $B^k$ denotes the message bits with $k = \lceil nI(X; A) \rceil$,
and $\{g_{\kappa,l} : \kappa \in [1, d], l \in [1, n]\}$ denote the check variables of the LDGM code, where the choice
of $d$ is explained later. The objective of the mappings $\psi_l : \{0, 1\}^d \times \mathcal{A} \to \{0, 1\}$ for $l \in [1, n]$
is to ensure that the types of the codewords, or action variables, are approximately equal to $P_A$
[13]. Specifically, each mapping $\psi_l$ applies to the subset of check variables $\{g_{\kappa,l}\}_{\kappa \in [1,d]}$ and to
the symbol $a_l$, and is defined in terms of a mapping $\varphi : \{0, 1\}^d \to \mathcal{A}$ as

$$\psi_l(\{g_{\kappa,l}\}_{\kappa \in [1,d]}, a) = \mathbb{1}_{\{\varphi(\{g_{\kappa,l}\}_{\kappa \in [1,d]}) = a\}}. \tag{31}$$

Following [13], the value of $d \in \mathbb{Z}_+$ is chosen such that there are integers $v_a$ for $a \in \mathcal{A}$ satisfying

$$\sum_{a \in \mathcal{A}} v_a = 2^d \quad \text{and} \quad P_A(a) \approx \frac{v_a}{2^d}. \tag{32}$$

The mapping $\varphi$ is then arbitrarily chosen such that exactly $v_a$ of the $2^d$ binary sequences
are mapped to the action $a$, for each $a \in \mathcal{A}$.

Fig. 4. Factor graph defining the action encoder.

Given the source sequence X n , the encoder runs the sum-product algorithm with decimation 
as in 031 in order to obtain the message bits B k , and hence the action sequence A n (see flU 
for a discussion of the role of decimation in source coding problems). 
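The choice of the integers $v_a$ in (32) and of a mapping from $d$-bit sequences to actions can be illustrated with a short sketch. The text only requires the mapping to be arbitrary subject to the counts $v_a$; the largest-remainder rounding and the lexicographic assignment used below are assumptions made for this example, as are the function names and the target pmf.

```python
from itertools import product

def quantize_pmf(P_A, d):
    """Integers v_a with sum 2^d and v_a / 2^d close to P_A(a)."""
    total = 2 ** d
    raw = {a: p * total for a, p in P_A.items()}
    v = {a: int(r) for a, r in raw.items()}              # round down first
    leftover = total - sum(v.values())
    # Give the remaining units to the entries with the largest remainders.
    for a in sorted(raw, key=lambda a: raw[a] - v[a], reverse=True)[:leftover]:
        v[a] += 1
    return v

def build_phi(v, d):
    """Map exactly v[a] of the 2^d binary sequences to action a."""
    seqs = list(product((0, 1), repeat=d))
    phi, start = {}, 0
    for a, count in v.items():
        for seq in seqs[start:start + count]:
            phi[seq] = a
        start += count
    return phi

# Hypothetical target action pmf with d = 2.
v = quantize_pmf({0: 0.7, 1: 0.3}, d=2)
phi = build_phi(v, d=2)
```

With $d = 2$ the achievable action probabilities are multiples of $1/4$, so a target of $0.7$ is approximated by $3/4$; a larger $d$ gives a finer approximation at the price of a larger factor graph.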



C. The Source Codes 



Based on the proposed encoder structure in Fig. 3(a), the design of each source code $\mathcal{C}_{s,a}$ for
$a \in \mathcal{A}$ is equivalent to the design of optimal codes for classical Wyner-Ziv problems.

In the special case where X = {0, 1}, and the distortion metric is Hamming, the coding 
problem reduces to the binary Wyner-Ziv problem with Hamming distortion which was studied 
in O, 01- 



V. Numerical Examples 



To exemplify the problems of interest and to demonstrate the tools developed in this paper, we
consider the source coding problem with action-dependent side information depicted in Fig. 5
and described in the following. Let $X \in \mathcal{X} = [1, K + 1]$ be a random variable with pmf

$$
P_X(x) = \begin{cases} \frac{1-q}{K} & \text{if } x \in [1, K] \\ q & \text{if } x = K + 1 \end{cases} \tag{33}
$$

for $q \in [0, 1]$. The letters $1, \ldots, K$ denote source outcomes that are relevant for the decoder,
and thus should ideally be distinguishable by the latter, while the letter $x = K + 1$ represents a
source outcome that is irrelevant for the decoder. Examples where this situation arises include
monitoring systems in which the decoder wishes to recover the values of a physical quantity
only when above, or below, a certain pre-determined threshold. To account for this requirement,
the distortion function is given by

$$d(x, \hat{x}) = \mathbb{1}_{\{x \ne \hat{x} \text{ and } x \in [1, K]\}}, \tag{34}$$

i.e., the decoder is only penalized if it makes an error when $x$ is a relevant letter.

At each time $i$, the decoder can choose an action $A_i \in \{0, 1\}$, such that, if $A_i = 0$, the
side information is given by $Y_i = e$, where $e$ denotes an erasure symbol, and, if $A_i = 1$, the
side information $Y_i$ is the output of an erasure channel with input $X_i$, in which
$\mathcal{Y} = \mathcal{X} \cup \{e\}$ and

$$
P_{Y|X}(y|x) = \begin{cases} p & \text{for } y = e \\ 1 - p & \text{for } y = x \\ 0 & \text{otherwise,} \end{cases} \tag{35}
$$

where $p \in (0, 1)$ is the erasure probability. The action cost function $\Lambda(\cdot)$ is given by $\Lambda(a) = \mathbb{1}_{\{a = 1\}}$,
which implies that the cost constraint with $0 \le C \le 1$ enforces that no more than $nC$
samples of the side information $Y^n$ can be measured by the receiver.
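The model of this example can be written down as a few small arrays. The sketch below is illustrative only: it assumes that the $K$ relevant letters are equiprobable with total mass $1 - q$, it uses 0-based indices (letters $0, \ldots, K-1$ relevant, letter $K$ irrelevant, last output index for the erasure $e$), and the values of `K`, `q` and `p` passed at the end are hypothetical.

```python
import numpy as np

def build_example(K, q, p):
    """Return P_X, P_{Y|X,A}, the distortion matrix and the action costs."""
    nx = K + 1
    P_X = np.full(nx, (1 - q) / K)
    P_X[K] = q                               # irrelevant letter
    ny = nx + 1                              # last index is the erasure symbol e
    P_YgXA = np.zeros((nx, 2, ny))
    P_YgXA[:, 0, ny - 1] = 1.0               # A = 0: side information is always e
    for x in range(nx):
        P_YgXA[x, 1, x] = 1 - p              # A = 1: Y = X with probability 1 - p
        P_YgXA[x, 1, ny - 1] = p             # A = 1: erasure with probability p
    dist = np.zeros((nx, nx))                # dist[x, xhat]
    for x in range(K):
        dist[x, :] = 1.0                     # errors count only for relevant letters
        dist[x, x] = 0.0
    cost = np.array([0.0, 1.0])              # cost 1 for measuring, 0 otherwise
    return P_X, P_YgXA, dist, cost

K = 4
P_X, P_YgXA, dist, cost = build_example(K=K, q=0.2, p=0.1)
```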

A. Computation of the Rate-Distortion-Cost Function 

We apply the proposed BA-type algorithm to the described scenario in order to compute the
rate-distortion-cost function. For reference, we also consider the simplified strategy, in which the
actions are chosen independently of the message $M$. We refer to the optimal approach discussed
thus far as "adaptive actions", while labeling as "non-adaptive actions" the simplified class of
strategies in which the actions are selected independently of the encoder's message (see [1]).
The performance with non-adaptive actions can be obtained from Proposition 1 by imposing that
$A$ and $X$ are independent.

Fig. 5. The action-dependent source coding problem.

Fig. 6. Computed rate-distortion-cost function $R(D, C)$ for $K = 4$, erasure probability $p \in \{0.0, 0.1\}$ and q = |.

Fig. 6 shows $R(D, C)$ for $K = 4$, q = | and $p \in \{0, 0.1\}$ with both adaptive actions and non-adaptive
actions. We see that, for the given scenario, significant gains are achieved using adaptive
actions in comparison to non-adaptive actions. Moreover, the effect of the erasures decreases as
the action cost decreases due to the reduced availability of the side information at the decoder.



B. Code Design 

We now turn to the issue of code design for the scenario. We consider the case in which $p = 0$,
so that the measured side information is noiseless, and we adopt the code design proposed in
Section IV. We start with some analytical considerations of the rate-distortion-cost function that
will be useful for designing the codes. By symmetry, the pmf $P_{A|X}$ can be written as

$$
P_{A|X}(a|x) = \begin{cases}
\frac{C - q\gamma}{1 - q} & \text{if } a = 1 \wedge x \in [1, K] \\
1 - \frac{C - q\gamma}{1 - q} & \text{if } a = 0 \wedge x \in [1, K] \\
\gamma & \text{if } a = 1 \wedge x = K + 1 \\
1 - \gamma & \text{if } a = 0 \wedge x = K + 1
\end{cases} \tag{36}
$$

where $\gamma \in \left[0, \min\left(1, \frac{C}{q}\right)\right]$ is a parameter to be determined. The mutual information $I(X; A)$
can thus be computed in terms of $P_{A|X}$ and $P_X$, and the rate-distortion-cost function in (7) is
then obtained via the following optimization problem

$$
R(D, C) = \min_{\gamma \in [0, \min(1, C/q)]} I(X; A) + (1 - C)\, R\left( \frac{D}{1 - C}, P_{X|A=0} \right), \tag{37}
$$

where $R(D, P_X)$ is the classical rate-distortion function of a memoryless source with pmf $P_X$.
Note that we have used the fact that $I(X; U|Y, A = 1) = 0$ since $Y = X$ for $A = 1$.
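The first term of the optimization over $\gamma$ can be evaluated directly from the one-parameter family of action pmfs. The sketch below is an illustration under stated assumptions: it takes the action probability for the relevant letters to be $(C - q\gamma)/(1 - q)$, so that the cost constraint is met with equality, assumes equiprobable relevant letters, and uses hypothetical values of `K`, `q` and `C`. The classical rate-distortion term is not computed here.

```python
import numpy as np

def action_pmf(K, q, C, gamma):
    """P_{A|X} as a (K+1) x 2 array; column 1 is the probability of A = 1."""
    p1 = (C - q * gamma) / (1 - q)           # same for all relevant letters
    P = np.empty((K + 1, 2))
    P[:K, 1] = p1
    P[:K, 0] = 1 - p1
    P[K, 1] = gamma                          # irrelevant letter
    P[K, 0] = 1 - gamma
    return P

def mutual_information(P_X, P_AgX):
    """I(X;A) in bits for the joint pmf P_X(x) P_{A|X}(a|x)."""
    J = P_X[:, None] * P_AgX
    P_A = J.sum(axis=0)
    ref = P_X[:, None] * P_A[None, :]
    mask = J > 0
    return float(np.sum(J[mask] * np.log2(J[mask] / ref[mask])))

# Hypothetical parameters.
K, q, C = 4, 0.2, 0.5
P_X = np.append(np.full(K, (1 - q) / K), q)
I_indep = mutual_information(P_X, action_pmf(K, q, C, gamma=C))    # actions independent of X
I_adapt = mutual_information(P_X, action_pmf(K, q, C, gamma=0.0))  # never measure the irrelevant letter
cost_adapt = float((P_X @ action_pmf(K, q, C, 0.0))[1])            # P(A = 1)
```

Choosing $\gamma = C$ makes the actions independent of the source, so the action code costs no rate; smaller values of $\gamma$ spend rate on the action code in exchange for measuring the side information mostly when the source letter is relevant.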



From (37), it is seen that we only need to design an action code and the source code $\mathcal{C}_{s,0}$,
where the latter is a classical rate-distortion code. For the action code, we use the approach



proposed in Section IV and for the source code we use the related LDGM scheme proposed in 



d. 

We consider the case where q = |, K = 4, which yields d = 2 for both the action code Ca and 
the source code $\mathcal{C}_{s,0}$. We fix a blocklength of $n = 10\,000$, yielding LDGM codes of blocklength
$20\,000$. Each point is averaged over 50 source realizations and LDGM codes. For both codes,
we use the sum-product algorithm with decimation in [14]. As in [14], we use damping after
30 iterations and the maximum number of iterations is set to 100. Nodes are decimated if their
log-likelihood ratios are larger than 2. Suitable irregular degree distributions optimized for the
AWGN channel are obtained from [16]. The results are shown in Fig. 7. It is seen that the
resulting distortions are close to the lower bounds for both the adaptive and non-adaptive action
strategies. Moreover, the theoretical gains of the adaptive action strategy versus the non-adaptive
one are confirmed by the practical implementation.

Fig. 7. Rate-distortion-cost function (lines) compared to the performance of the proposed code design (markers) with both
adaptive and non-adaptive actions.

VI. Conclusion 

In this paper, we have considered computation of the rate-distortion-cost function and code
design for source coding problems with action-dependent side information. We have formulated
the problem using Shannon strategies and proposed a BA-type algorithm that efficiently computes
the rate-distortion-cost function. Convergence of this algorithm was proved. Moreover, we proposed
a code design based on multiplexing that was shown, via numerical results, to perform close to
the rate-distortion-cost bound.



Appendix A 
Proof of Proposition 4



The BA-type algorithm detailed in Algorithm 1 and Algorithm 2 is based on alternately
optimizing $F(\cdot)$ in (23) with respect to $P_{T|X}$, $Q_A$ and $Q_{T,Y}$. Given the convexity of this
function, shown in Proposition 3, this procedure is known to converge [17]. The optimization
with respect to $Q_A$ for fixed $P_{T|X}$ and $Q_{T,Y}$, and with respect to $Q_{T,Y}$ for fixed $P_{T|X}$ and $Q_A$,
is performed as in the proof of Proposition 3. Therefore, the proof is concluded once it is
demonstrated that the procedure of Algorithm 2 converges to an optimum $P_{T|X}$ for fixed
$Q_A$ and $Q_{T,Y}$. This is discussed next. The procedure in Algorithm 2 for the optimization
with respect to $P_{T|X}$ for fixed $Q_A$ and $Q_{T,Y}$ is based on the dual problem in (27), solved via an outer
loop that performs subgradient iterations and an inner loop that performs fixed-point iterations
to obtain a stationary point of the Lagrangian function (38) (see below). We first show that this
nested loop procedure obtains an optimal solution $P_{T|X}$ of the dual problem and then argue
that this is also a solution for the original primal problem.

Convergence of the outer loop follows immediately from the well-known properties of the
subgradient approach for weights that are selected as in Table Algorithm 2 [11]. Note that the constraints
$1 - \sum_{a \in \mathcal{A}} P(a|x)$ for $x \in \mathcal{X}$ are the subgradients with respect to $\lambda_x$ of the dual function
given by the minimization in (27) [18]. Therefore, by defining $\mu_x = -\lambda_x / P(x) + 2$, the updates of
the variables $\mu_x$ in Table Algorithm 2 can be seen to correspond to the classical subgradient
updates. Given the known convergence properties of the subgradient method with the weights
as in Table Algorithm 2, the outer maximization converges [11].

Next, we need to show that we can solve the inner minimization in (27) by using the fixed-point
iterations in (47) (see below). It is first shown that we can solve the minimization problem
by solving a system of stationarity equations for $P(a|x)$, $a \in \mathcal{A}$, $x \in \mathcal{X}$. Then, we conclude the
proof using the Banach fixed-point theorem [19].



The Lagrangian to be minimized is given by (cf. (27))

$$\mathcal{L}(P_{T|X}, \{\lambda_x\}) = F(P_{T|X}, Q_A, Q_{T,Y}) + \sum_{x \in \mathcal{X}} \lambda_x \left( \sum_{t \in \mathcal{T}} P_{T|X}(t|x) - 1 \right). \tag{38}$$

It is noted that the function $\mathcal{L}$ is coercive in $P_{T|X}$, and hence, from the Weierstrass theorem [20], a
minimizer of $\mathcal{L}$ exists. The minimizer must be a stationary point, i.e., it must satisfy the KKT
conditions [11, Section 5.5.3]. We obtain the following stationarity conditions by differentiating
(38) with respect to $P(t|x)$ and equating to zero, leading to

log p(t\ x ) + p (y\ x > a W) lo s i p (y> a W] = 

mA(a(*)) + p (y\ x > a (0) x) + log Q(t, y) + log Q(a(t))] + ^, (39) 

$$\log P(t|x) + \sum_{y \in \mathcal{Y}} P(y|x, a(t)) \log P(y, a(t)) = \log \alpha_{t,x} + \mu_x, \tag{40}$$

where $\alpha_{t,x}$ is given in (29) and $P(y, a)$ is calculated from the joint pmf in (13). We can then
rewrite (40) by applying the exponential function to both sides and solving for $P(t|x)$:

$$P(t|x) = \frac{\alpha_{t,x} \, 2^{\mu_x}}{\prod_{y \in \mathcal{Y}} \left[ \sum_{x' \in \mathcal{X}} P(x') P(a(t)|x') P(y|x', a(t)) \right]^{P(y|x, a(t))}}, \tag{41}$$



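where $\alpha_{t,x}$ is given in (29). Note that the right-hand side depends on $P_{T|X}$ only through $P_{A|X}$,
and hence, by computing $P(a|x)$ for $a \in \mathcal{A}$ and $x \in \mathcal{X}$, $P(t|x)$ can be calculated. By summing
(41) over $t \in \mathcal{T}_a$, we obtain

$$P(a|x) = \frac{\alpha_{a,x} \, 2^{\mu_x}}{\prod_{y \in \mathcal{Y}} \left[ \sum_{x' \in \mathcal{X}} P(x') P(a|x') P(y|x', a) \right]^{P(y|x, a)}}, \tag{42}$$

where $\alpha_{a,x}$ is given in (30). Given $\{\mu_x\}$, the equalities in (42) for $a \in \mathcal{A}$ and $x \in \mathcal{X}$ form a
system of $|\mathcal{A}||\mathcal{X}|$ nonlinear equations with $|\mathcal{A}||\mathcal{X}|$ unknowns, namely the values $P(a|x)$
for $a \in \mathcal{A}$ and $x \in \mathcal{X}$. By solving for $P(a|x)$, we can compute $P(t|x)$ as in (41). Note that the
constants $\alpha_{a,x}$ are sums of exponential functions, and hence $P(a|x)$ in (42) are strictly positive
for $a \in \mathcal{A}$ and $x \in \mathcal{X}$. Now, define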

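$$h_{a|x}(P_{A|X}, \mu_x) = \frac{2^{\mu_x} \alpha_{a,x}}{\prod_{y \in \mathcal{Y}} \left[ \sum_{x' \in \mathcal{X}} P(x') P(a|x') P(y|x', a) \right]^{P(y|x, a)}}, \tag{43}$$

$$H_{a|x}(\mathbf{q}, \mu_x) = \log h_{a|x}(2^{\mathbf{q}}, \mu_x), \tag{44}$$

and

$$G_{a|x}(\mathbf{q}, \mu_x) = \log g_{a|x}(2^{\mathbf{q}}, \mu_x) = \beta q_{a|x} + (1 - \beta) H_{a|x}(\mathbf{q}, \mu_x), \tag{45}$$

where $\mathbf{q} \in \mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ and $2^{\mathbf{q}} \in \mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ are the vectors corresponding to the elements $q_{a|x} =
\log P(a|x)$ and $P(a|x)$, respectively, and $\beta \in (0, 1)$. Moreover, let $G(\mathbf{q}, \{\mu_x\}) \in \mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ denote
the vector collecting the functions $G_{a|x}$ for $a \in \mathcal{A}$, $x \in \mathcal{X}$. With these definitions it is now
evident that (42) is equivalent to the following equation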

$$q_{a|x} = G_{a|x}(\mathbf{q}, \{\mu_x\}). \tag{46}$$

We now show that the fixed-point iteration of the form

$$\mathbf{q}^{(k+1)} = G(\mathbf{q}^{(k)}, \{\mu_x\}) \tag{47}$$

converges towards a fixed point $\mathbf{q}^*$, which is the unique fixed point of (46), for any $\beta \in (0, 1)$.
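As a rough numerical illustration of the damped iteration (47), the following Python sketch iterates
q <- beta*q + (1-beta)*H(q) for given inputs. It is only a sketch under stated assumptions: logarithms
are taken to base 2, H is read from (42)-(44) as log2(alpha) + mu_x - sum_y P(y|x,a) log2 P(y,a), the
channel P(y|x,a) is assumed strictly positive, and alpha and mu are treated as given constants. The
function name, array layout and stopping rule are choices made here and are not taken from the
algorithm tables.

```python
import numpy as np

def fixed_point_actions(alpha, mu, p_x, p_y_given_xa, beta=0.5, iters=500, tol=1e-12):
    """Damped log-domain iteration q <- beta*q + (1-beta)*H(q) for the system (42).

    alpha:        (A, X) positive constants alpha_{a,x} (assumed given)
    mu:           (X,)   multipliers mu_x (assumed given)
    p_x:          (X,)   source pmf P(x)
    p_y_given_xa: (Y, X, A) strictly positive channel P(y|x,a)
    Returns the solution P(a|x) as an (A, X) array; for an arbitrary mu
    it is positive but need not sum to one over a.
    """
    A, X = alpha.shape
    q = np.full((A, X), -np.log2(A))          # start from uniform P(a|x)
    for _ in range(iters):
        p_a_x = 2.0 ** q                      # P(a|x), shape (A, X)
        # P(y,a) = sum_x P(x) P(a|x) P(y|x,a), shape (Y, A)
        p_ya = np.einsum('x,ax,yxa->ya', p_x, p_a_x, p_y_given_xa)
        # H_{a|x}(q) = log2 alpha_{a,x} + mu_x - sum_y P(y|x,a) log2 P(y,a)
        H = (np.log2(alpha) + mu[None, :]
             - np.einsum('yxa,ya->ax', p_y_given_xa, np.log2(p_ya)))
        q_new = beta * q + (1.0 - beta) * H   # damped update
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return 2.0 ** q
```

In a full implementation this routine would play the role of the inner loop, with the outer loop
adjusting mu until the returned values sum to one over a for every x.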

Recall that the existence of a fixed point $\mathbf{q}^*$ is guaranteed by the necessity of the KKT
conditions and by the Weierstrass theorem. In the following, we apply the Banach fixed-point theorem.
To this end, we have to demonstrate that there is a closed subset $\Omega \subseteq \mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ such that the
vector function $G$ maps vectors $\mathbf{q} \in \Omega$ into $\Omega$ and is a contraction in $\Omega$. Given the existence
of a fixed point $\mathbf{q}^*$, we define the subset $\Omega$ as the closed ball

$$\Omega = B_r(\mathbf{q}^*) = \left\{ \mathbf{q} \in \mathbb{R}^{|\mathcal{A}||\mathcal{X}|} : \|\mathbf{q} - \mathbf{q}^*\|_\infty \le r \right\}, \tag{48}$$



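for some $r > \|\mathbf{q}^{(0)} - \mathbf{q}^*\|_\infty$. In order to show that $G$ maps from $\Omega$ into $\Omega$ and is a contraction,
we compute the partial derivatives of $H_{a|x}(\mathbf{q})$ and $G_{a|x}(\mathbf{q})$ as follows:

$$\frac{\partial H_{a|x}(\mathbf{q})}{\partial q_{a'|x'}} = -\mathbb{1}\{a' = a\} \sum_{y \in \mathcal{Y}} \frac{P(y|x, a) \, P(x') \, 2^{q_{a|x'}} P(y|x', a)}{\sum_{x'' \in \mathcal{X}} P(x'') \, 2^{q_{a|x''}} P(y|x'', a)} \tag{49}$$

and

$$\frac{\partial G_{a|x}(\mathbf{q})}{\partial q_{a'|x'}} = \beta \, \mathbb{1}\{a' = a, x' = x\} + (1 - \beta) \frac{\partial H_{a|x}(\mathbf{q})}{\partial q_{a'|x'}}. \tag{50}$$

It is clear that the derivative $\partial H_{a|x}(\mathbf{q}) / \partial q_{a|x'}$ is strictly negative for $\mathbf{q} \in \mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ since $P(x) > 0$, and it
can be seen that

$$\sum_{a' \in \mathcal{A}, x' \in \mathcal{X}} \frac{\partial H_{a|x}(\mathbf{q})}{\partial q_{a'|x'}} = -1. \tag{51}$$

Therefore, for $\beta \in (0, 1)$, we must have that

$$\sum_{a' \in \mathcal{A}, x' \in \mathcal{X}} \left| \frac{\partial G_{a|x}(\mathbf{q})}{\partial q_{a'|x'}} \right| < 1. \tag{52}$$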
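The step leading to (52) can be made explicit with a short case analysis. The following is a sketch
that relies only on the two facts stated above, namely that the partial derivatives of $H_{a|x}$ are
non-positive and sum to $-1$; the shorthand $c$ is introduced here for illustration and does not
appear in the original derivation.

```latex
% Shorthand (an assumption of this sketch): c = -\partial H_{a|x}(\mathbf{q}) / \partial q_{a|x}, so that 0 < c \le 1,
% and the remaining derivatives of H_{a|x} have absolute values summing to 1 - c.
\begin{align*}
\sum_{a' \in \mathcal{A},\, x' \in \mathcal{X}} \left| \frac{\partial G_{a|x}(\mathbf{q})}{\partial q_{a'|x'}} \right|
  &= \left| \beta - (1-\beta)\, c \right| + (1-\beta)(1-c) \\
  &= \begin{cases}
       1 - 2(1-\beta)\, c, & \beta \ge (1-\beta)\, c, \\
       1 - 2\beta,         & \beta < (1-\beta)\, c,
     \end{cases} \\
  &< 1 \qquad \text{for } \beta \in (0,1).
\end{align*}
```

Both cases are strictly below one because $c > 0$ and $0 < \beta < 1$, which is why the damping
factor $\beta$ must lie strictly inside the unit interval.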



It follows that we can bound the $\ell_\infty$-norm of the Jacobian of $G(\mathbf{q})$, denoted $J_G(\mathbf{q})$, as

$$\|J_G(\mathbf{q})\|_\infty < 1. \tag{53}$$
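The bound (53) can be checked numerically on small random instances. The sketch below builds the
Jacobian of G analytically and returns its infinity norm. It assumes base-2 logarithms, a strictly
positive channel P(y|x,a), and the reading of H as log2(alpha) + mu_x - sum_y P(y|x,a) log2 P(y,a), so
that alpha and mu drop out of the derivatives; the function name and array layout are illustrative
choices, not part of the original text.

```python
import numpy as np

def jacobian_inf_norm(q, p_x, p_y_given_xa, beta):
    """Row sums of dH/dq and the infinity norm of the Jacobian of G = beta*q + (1-beta)*H.

    q:            (A, X) log-probabilities q_{a|x} = log2 P(a|x)
    p_x:          (X,)   source pmf P(x)
    p_y_given_xa: (Y, X, A) strictly positive channel P(y|x,a)
    beta:         damping factor in (0, 1)
    """
    A, X = q.shape
    p_a_x = 2.0 ** q
    # P(y,a) = sum_x P(x) P(a|x) P(y|x,a), shape (Y, A)
    p_ya = np.einsum('x,ax,yxa->ya', p_x, p_a_x, p_y_given_xa)
    # w[a, x, z] = sum_y P(y|x,a) P(z) P(a|z) P(y|z,a) / P(y,a), with z in the role of x'.
    # Derivatives across different actions vanish, so one (X, X) block per action suffices.
    w = np.einsum('yxa,z,az,yza,ya->axz',
                  p_y_given_xa, p_x, p_a_x, p_y_given_xa, 1.0 / p_ya)
    dH = -w                                           # all entries non-positive
    dG = beta * np.eye(X)[None, :, :] + (1.0 - beta) * dH
    h_row_sums = dH.sum(axis=2)                       # each entry should equal -1
    norm = np.abs(dG).sum(axis=2).max()               # max absolute row sum
    return h_row_sums, norm
```

For any beta strictly between 0 and 1 the returned norm should be below one, while beta equal to 0
would give a norm of exactly one, which illustrates why the undamped iteration is not covered by
the argument.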



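By the definition of the $\ell_\infty$-norm and by the mean value theorem [19], there exist values $a \in \mathcal{A}$,
$x \in \mathcal{X}$ and $\zeta \in (0, 1)$ such that

\begin{align}
\|G(\mathbf{q}^1) - G(\mathbf{q}^2)\|_\infty &= \left| G_{a|x}(\mathbf{q}^1) - G_{a|x}(\mathbf{q}^2) \right| \tag{54a} \\
&\le \|\mathbf{q}^1 - \mathbf{q}^2\|_\infty \sum_{a' \in \mathcal{A}, x' \in \mathcal{X}} \left| \frac{\partial G_{a|x}(\zeta \mathbf{q}^1 + (1 - \zeta) \mathbf{q}^2)}{\partial q_{a'|x'}} \right| \tag{54b} \\
&\le \|\mathbf{q}^1 - \mathbf{q}^2\|_\infty \max_{\mathbf{q} \in \Omega} \|J_G(\mathbf{q})\|_\infty \tag{54c} \\
&\le K \|\mathbf{q}^1 - \mathbf{q}^2\|_\infty \tag{54d}
\end{align}

for $\mathbf{q}^1, \mathbf{q}^2 \in \Omega$, where the last inequality follows from the fact that, by the Weierstrass theorem,
$\|J_G(\mathbf{q})\|_\infty$ must attain a maximum value $K < 1$ over $\mathbf{q} \in \Omega$, since $\Omega$ is closed and bounded.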



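The chain of inequalities in (54) demonstrates that $G$ is a contraction mapping. To show that $G$
maps from $\Omega$ into $\Omega$, suppose $\mathbf{q} \in \Omega$. Since $\Omega$ contains the fixed point $\mathbf{q}^*$, it is then seen that

\begin{align}
\|G(\mathbf{q}) - \mathbf{q}^*\|_\infty &= \|G(\mathbf{q}) - G(\mathbf{q}^*)\|_\infty \tag{55} \\
&\le \|\mathbf{q} - \mathbf{q}^*\|_\infty \le r, \tag{56}
\end{align}

and hence $G(\mathbf{q}) \in \Omega$. By invoking the Banach fixed-point theorem, the fixed-point iteration
defined by (47) converges to a unique fixed point $\mathbf{q}^*$.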

We finally observe that, since the fixed point is unique, the minimizer of the Lagrangian
function $\mathcal{L}$ is unique, and hence the optimal $P_{T|X}$ of the primal and the dual optimization
problems coincide, thus concluding the proof.
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